Skip to content

NMS-19648: CircleCI: Store failed result of flaky tests, improve a test case#8404

Open
mershad-manesh wants to merge 38 commits intodevelopfrom
mem/improvements
Open

NMS-19648: CircleCI: Store failed result of flaky tests, improve a test case#8404
mershad-manesh wants to merge 38 commits intodevelopfrom
mem/improvements

Conversation

@mershad-manesh
Copy link
Copy Markdown
Contributor

All Contributors

External References

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR focuses on improving CI reliability and debuggability around flaky tests by preserving failure evidence across retries, refining CircleCI smoke/itest distribution, and stabilizing a few UI component tests.

Changes:

  • Update CircleCI smoke/itest scripts to preserve failing JUnit XMLs as “flaky evidence” before retrying, and improve test splitting/logging.
  • Introduce a quarantined smoke-test-flaky CircleCI job/suite and adjust existing smoke job parallelism/config behavior.
  • Stabilize several Vue/Vitest UI tests (dialog stubbing, cleanup improvements, and debounce handling) and harden Selenium clicking via retry/obscured-element checks.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
ui/tests/components/EventConfigurationDetail/EventConfigEventTable.test.ts Adds dialog mocking/cleanup and switches debounce verification to flush() approach.
ui/tests/components/EventConfiguration/EventConfigSourceTable.test.ts Adds stubs for dialogs, improves teardown, and updates debounce tests to use flush().
ui/tests/components/EventConfiguration/Dialog/UploadedFileRenameDialog.test.ts Mocks FeatherDialog and improves teardown by closing dialog before unmount.
ui/tests/components/EventConfiguration/Dialog/CreateEventConfigurationDialog.test.ts Mocks FeatherDialog and adjusts teardown/visibility handling to reduce flakiness.
ui/tests/components/EventConfigEventCreate/BasicInformation.test.ts Ensures fake timers are always restored via try/finally and adds afterEach timer reset.
smoke-test/src/test/java/org/opennms/smoketest/GrafanaEndpointPageIT.java Uses clickElement(By...) helper to reduce click flakiness.
smoke-test/src/main/java/org/opennms/smoketest/ui/framework/Toggle.java Switches to retrying click helper for toggles.
smoke-test/src/main/java/org/opennms/smoketest/ui/framework/Element.java Adds clickWithRetry with scroll + obscured-element detection.
smoke-test/src/main/java/org/opennms/smoketest/ui/framework/Button.java Switches to retrying click helper for buttons.
smoke-test/src/main/java/org/opennms/smoketest/selenium/AbstractOpenNMSSeleniumHelper.java Enhances clickElement with scroll + obscured-element detection.
.circleci/scripts/smoke.sh Filters ITs by suite category, simplifies docker pre-pull, and preserves failing XMLs before retries.
.circleci/scripts/itest.sh Adds node summary logging, preserves failing XMLs before retries, and increases db create threads.
.circleci/scripts/find-tests/git.py Fails fast with a clearer error when .nightly lacks parent_branch.
.circleci/scripts/find-tests/find-tests.py Fixes shebang and removes noisy debug printing.
.circleci/pyscripts/generate_main.py Reduces workflow max_auto_reruns.
.circleci/main/workflows/workflows_v2.json Adds workflow entry for quarantined flaky smoke tests.
.circleci/main/jobs/tests/smoke/smoke-test.index Registers the new smoke-test-flaky job.
.circleci/main/jobs/tests/smoke/smoke-test-sentinel.yml Increases sentinel smoke parallelism.
.circleci/main/jobs/tests/smoke/smoke-test-minion.yml Increases minion smoke parallelism.
.circleci/main/jobs/tests/smoke/smoke-test-flaky.yml Adds a quarantined flaky smoke job (allow failures, retries).
.circleci/main/jobs/tarball-assembly-only.yml Adjusts resource defaults and Maven heap settings.
.circleci/main/commands/oci/trivy-analyze.yml Removes a redundant/incorrect step from the Trivy analyze command.
.circleci/main/commands/generic/generic.yml Avoids repeated apt installs by checking package presence first.
.circleci/main/commands/executions/run-smoke-tests.yml Adds allow-failures mode and collects/stores flaky-evidence artifacts.
.circleci/main/commands/executions/run-integration-tests.yml Collects/stores flaky-evidence artifacts for integration tests.
.circleci/main/commands/executions/run-build.yml Ensures integration tests are skipped for the build-only compile step.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cgorantla
cgorantla previously approved these changes Mar 30, 2026
Copy link
Copy Markdown
Contributor

@cgorantla cgorantla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +20 to +40
| xargs grep -L 'MinionTests\|SentinelTests\|FlakyTests' \
| sed -e 's#^.*src/test/java/\(.*\)\.java#\1#' | tr "/" "." \
> failsafe_classnames
;;
minion)
circleci tests glob '**/src/test/java/**/*IT*.java' \
| xargs grep -l 'MinionTests' \
| sed -e 's#^.*src/test/java/\(.*\)\.java#\1#' | tr "/" "." \
> failsafe_classnames
;;
sentinel)
circleci tests glob '**/src/test/java/**/*IT*.java' \
| xargs grep -l 'SentinelTests' \
| sed -e 's#^.*src/test/java/\(.*\)\.java#\1#' | tr "/" "." \
> failsafe_classnames
;;
flaky)
circleci tests glob '**/src/test/java/**/*IT*.java' \
| xargs grep -l 'FlakyTests' \
| sed -e 's#^.*src/test/java/\(.*\)\.java#\1#' | tr "/" "." \
> failsafe_classnames
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_tests() pipes circleci tests glob into xargs grep .... When grep -l/-L finds no matches (or when the glob yields no files), grep exits non-zero and, with set -e -o pipefail, this will abort the script even though an empty test list should be valid. Consider guarding this stage (e.g., use xargs -r and/or wrap the grep stage with || true in a subshell) so failsafe_classnames can be empty without failing the job.

Suggested change
| xargs grep -L 'MinionTests\|SentinelTests\|FlakyTests' \
| sed -e 's#^.*src/test/java/\(.*\)\.java#\1#' | tr "/" "." \
> failsafe_classnames
;;
minion)
circleci tests glob '**/src/test/java/**/*IT*.java' \
| xargs grep -l 'MinionTests' \
| sed -e 's#^.*src/test/java/\(.*\)\.java#\1#' | tr "/" "." \
> failsafe_classnames
;;
sentinel)
circleci tests glob '**/src/test/java/**/*IT*.java' \
| xargs grep -l 'SentinelTests' \
| sed -e 's#^.*src/test/java/\(.*\)\.java#\1#' | tr "/" "." \
> failsafe_classnames
;;
flaky)
circleci tests glob '**/src/test/java/**/*IT*.java' \
| xargs grep -l 'FlakyTests' \
| sed -e 's#^.*src/test/java/\(.*\)\.java#\1#' | tr "/" "." \
> failsafe_classnames
| xargs -r grep -L 'MinionTests\|SentinelTests\|FlakyTests' \
| sed -e 's#^.*src/test/java/\(.*\)\.java#\1#' | tr "/" "." \
> failsafe_classnames || true
;;
minion)
circleci tests glob '**/src/test/java/**/*IT*.java' \
| xargs -r grep -l 'MinionTests' \
| sed -e 's#^.*src/test/java/\(.*\)\.java#\1#' | tr "/" "." \
> failsafe_classnames || true
;;
sentinel)
circleci tests glob '**/src/test/java/**/*IT*.java' \
| xargs -r grep -l 'SentinelTests' \
| sed -e 's#^.*src/test/java/\(.*\)\.java#\1#' | tr "/" "." \
> failsafe_classnames || true
;;
flaky)
circleci tests glob '**/src/test/java/**/*IT*.java' \
| xargs -r grep -l 'FlakyTests' \
| sed -e 's#^.*src/test/java/\(.*\)\.java#\1#' | tr "/" "." \
> failsafe_classnames || true

Copilot uses AI. Check for mistakes.
// Before debounce time
it('does not call store immediately on input (debounce)', () => {
wrapper.vm.onChangeSearchTerm('test')
expect(store.onChangeEventsSearchTerm).not.toHaveBeenCalled()
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This debounce test calls onChangeSearchTerm('test') but never flushes/cancels the debounced function. With real timers enabled, the pending debounce may fire after the test completes and bleed into subsequent tests (mock call counts/state). Cancel (or flush) the debounced function at the end of this test or in afterEach (e.g., onChangeSearchTerm.cancel()).

Suggested change
expect(store.onChangeEventsSearchTerm).not.toHaveBeenCalled()
expect(store.onChangeEventsSearchTerm).not.toHaveBeenCalled()
wrapper.vm.onChangeSearchTerm.cancel()

Copilot uses AI. Check for mistakes.
cgorantla
cgorantla previously approved these changes Mar 30, 2026
Copy link
Copy Markdown
Contributor

@cgorantla cgorantla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown
Contributor

@synqotik synqotik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just one change to make.

element);

return result instanceof Boolean && (Boolean)result;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with this (Copilot comment), isCenterPointObscured should only appear once. I'm thinking make it a public static function in Element, which would actually look more like the implementation here. The Javascript checks if the element is null so you don't have to do that explicitly in the Java code.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +150 to 154
| xargs -I{} cp {} "${FLAKY_EVIDENCE_DIR}/"
# Now delete originals so fresh results are written by the retry
find . \( -path "*/failsafe-reports/TEST-*.xml" -o -path "*/surefire-reports/TEST-*.xml" \) \
-exec grep -l -E 'failures="[1-9]|errors="[1-9]' {} + 2>/dev/null \
| xargs rm -f
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cp/rm pipelines use xargs without -r/--no-run-if-empty, which can still invoke cp/rm with no arguments when no failing XMLs are found (leading to noisy errors, even though set +e prevents failure). Consider adding xargs -r (and optionally -0 with find -print0) to make this robust and quiet.

Suggested change
| xargs -I{} cp {} "${FLAKY_EVIDENCE_DIR}/"
# Now delete originals so fresh results are written by the retry
find . \( -path "*/failsafe-reports/TEST-*.xml" -o -path "*/surefire-reports/TEST-*.xml" \) \
-exec grep -l -E 'failures="[1-9]|errors="[1-9]' {} + 2>/dev/null \
| xargs rm -f
| xargs -r -I{} cp {} "${FLAKY_EVIDENCE_DIR}/"
# Now delete originals so fresh results are written by the retry
find . \( -path "*/failsafe-reports/TEST-*.xml" -o -path "*/surefire-reports/TEST-*.xml" \) \
-exec grep -l -E 'failures="[1-9]|errors="[1-9]' {} + 2>/dev/null \
| xargs -r rm -f

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +99 to +103
final WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10), Duration.ofMillis(200));
wait.ignoring(StaleElementReferenceException.class).until(webDriver -> {
final WebElement element = webDriver.findElement(by);
((JavascriptExecutor)webDriver).executeScript("arguments[0].scrollIntoView({block: 'center', inline: 'center'});", element);

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clickWithRetry uses webDriver.findElement(by) inside WebDriverWait.until(...) but does not ignore NoSuchElementException. Since implicit wait is set to 0, a briefly-missing element will throw immediately and abort the wait instead of retrying, which can reintroduce flakiness. Consider adding NoSuchElementException to the ignored exceptions (and/or using an ExpectedCondition that returns false/null when the element is absent).

Copilot uses AI. Check for mistakes.
mershad-manesh and others added 3 commits April 9, 2026 12:50
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants